ABSTRACT

Credibility  theory  is  the  name  given  by  American  actuaries  to 
linear  estimation  formulae  developed  to  experience-rate  insurance 
premiums.  These  formulae  can  be  viewed  as  linear  Bayesian  forecasts 
of  a conditional  mean,  exact  under  certain  conditions,  and  best  least- 
squares  approximations  otherwise.  This  paper  surveys  the  recent 
theoretical  developments  in  the  actuarial  literature,  relates  these 
results  to  other  linear  estimation  methods,  and  describes  a variety 
of  special  models  and  applications. 
A SURVEY OF CREDIBILITY THEORY
by

William S. Jewell

1.  INTRODUCTION 

Credibility theory is the name given by American actuaries to heuristic
linear estimation formulae developed in the 1920's for insurance rate-making
problems. These results and their recent extensions are not only useful in
practice, but have interesting relationships with other estimating and
forecasting methods, such as the classical formula for the combination of
observations due to Gauss, maximum-likelihood estimators, Bayesian
estimation, and linear filter theory. Credibility forecasts can be viewed as
linear Bayesian forecasts of a conditional mean, exact under certain
conditions, and best least-squares approximations otherwise.

Many new theoretical results and special models have appeared in the
actuarial literature; credibility theory was the theme of a recent
actuarial research conference [38]. In this paper, we shall survey these
results, and relate them to linear estimation results from other fields.
Other noninsurance applications of credibility will also be described.
2. THE BASIC CREDIBILITY FORMULA

In the original insurance experience rating problem which gave rise to
credibility, we consider a collective of similar but somewhat heterogeneous
insurance contracts which are grouped together to "spread the risk." It is
assumed that detailed prior statistics are available from this pool; in
particular, the manual or collective fair premium, $m = E\{\tilde{x}\}$, is the
average value of the risk random variable of interest, $\tilde{x}$, such as number
of accidents per year, total dollar claims per unit exposure, etc.

Now suppose a new insurance contract of unknown risk characteristics
is underwritten, and assigned to this pool. At the beginning, the
individual fair premium charged would be just the collective premium $m$;
however, as $n$ years' individual experience data $[x_1, x_2, \ldots, x_n]$ is
obtained on this risk, it seems reasonable that the individual sample mean,
$\bar{x} = \sum_t x_t / n$, would tend to reflect more nearly the risk
characteristics of the individual, except for the large variability in
$\bar{x}$ with small $n$.

Using heuristic reasoning on the pooling of data (and considering only
the number of claims per year), the early actuarial literature argued for
an experience-rated fair premium for next year's risk, $\tilde{x}_{n+1}$, of the form

(2.1)  $E\{\tilde{x}_{n+1} \mid x_1, x_2, \ldots, x_n\} \approx (1 - Z)m + Z\bar{x}$ ,

with

(2.2)  $Z = \frac{n}{n + N}$ .

$Z$ was called the credibility factor; it mixes the manual premium, $m$,
and the experience premium, $\bar{x}$, with increasing "credibility" attached to
the latter as $n$ increases. The time constant $N$ was essentially
determined by trial and error for different types of insurance. This
credibility formula was successfully used in American casualty insurance
rate-making for more than 50 years, with innumerable variations and
elaborations. A survey, with references, may be found in Longley-Cook [41];
see also [22] and [23].
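The mixing in (2.1), (2.2) can be illustrated with a minimal sketch. The function name and argument names below are hypothetical, chosen only for illustration; the time constant is simply supplied by the user, as in the early trial-and-error practice.

```python
def credibility_premium(claims, manual_premium, time_constant):
    """Experience-rated premium (2.1) with credibility factor (2.2).

    claims         -- list of yearly observations x_1, ..., x_n (may be empty)
    manual_premium -- collective fair premium m
    time_constant  -- the constant N of (2.2), assumed positive
    """
    n = len(claims)
    z = n / (n + time_constant)           # credibility factor Z = n / (n + N)
    xbar = sum(claims) / n if n else 0.0  # sample mean; irrelevant when n = 0
    return (1 - z) * manual_premium + z * xbar


# With no data the forecast is the manual premium; with data it moves
# toward the individual sample mean.
print(credibility_premium([], 10.0, 5.0))
print(credibility_premium([2.0, 4.0], 10.0, 2.0))
```

With no experience the sketch returns $m$ itself, and as $n$ grows the weight on the sample mean approaches one.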
3. THE BAYESIAN APPROACH

The modern development of credibility theory begins with the
resurgence of interest in Bayesian ideas in the 1950's, and with the
works of Bailey [2], [3] and Mayerson [42], who showed that the
experience-rating problem could be formulated as finding a Bayesian
conditional mean, as already implied by the notation in (2.1).


Let each member of the risk collective be characterized by a
(scalar- or vector-valued) risk parameter $\theta$; the heterogeneity of
the collective is then described by a prior density $u(\theta)$, from which
each risk draws an independent sample. Given $\theta$, the distribution of
an individual's risk variable for one year, $\tilde{x} = x$, is given by a
likelihood density, $p(x \mid \theta)$; on an individual basis, the fair premium
is

(3.1)  $m(\theta) = E\{\tilde{x} \mid \theta\} = \int x \, p(x \mid \theta) \, dx$ ,

and the individual variance is

(3.2)  $v(\theta) = V\{\tilde{x} \mid \theta\} = \int (x - m(\theta))^2 \, p(x \mid \theta) \, dx$ .

The pooled statistics from the collective of risks, however, have
a mixed collective density $p(x) = E\,p(x \mid \theta)$, and this implies the
collective fair premium is:

(3.3)  $m = E\{\tilde{x}\} = E\,m(\theta)$ ,

with total collective variance

(3.4)  $v = V\{\tilde{x}\} = E + D$ ;  $E = E\,v(\theta)$ ;  $D = V\,m(\theta)$ .

Using standard Bayesian arguments, the exact experience-rated fair
premium is

(3.5)  $E\{\tilde{x}_{n+1} \mid x_1, x_2, \ldots, x_n\} = E\{m(\theta) \mid x_1, x_2, \ldots, x_n\} = \int m(\theta) \left[ \frac{\prod_{t=1}^{n} p(x_t \mid \theta) \, u(\theta)}{\int \prod_{t=1}^{n} p(x_t \mid \vartheta) \, u(\vartheta) \, d\vartheta} \right] d\theta$ ,

where we have assumed that each successive year's experience is
independent, for a given (constant) $\theta$. The term in square brackets
is the posterior-to-data density of $\theta$ for this risk.

Bailey and Mayerson showed that the exact result (3.5) could be
rearranged into the credibility form (2.1) for the special
prior-likelihood combinations: Beta-Binomial, Gamma-Poisson,
Gamma-Exponential, and Normal-Normal (known variances). Here $m$ was
calculated by (3.3), and $N$ in (2.2) was a function of the
hyperparameters of the prior, $u(\theta)$.
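As an illustration of the rearrangement for one of these pairs, the following sketch takes the Bernoulli case of the Beta-Binomial combination (one trial per year). It assumes a Beta prior with hypothetical hyperparameters `a` and `b` on the yearly claim probability, for which the standard posterior mean is $(a + \sum_t x_t)/(a + b + n)$; the function names are illustrative only.

```python
def beta_bernoulli_posterior_mean(data, a, b):
    """Exact Bayesian conditional mean for a Beta(a, b) prior and 0/1 data."""
    return (a + sum(data)) / (a + b + len(data))


def credibility_form(data, a, b):
    """The same quantity written as (1 - Z) m + Z xbar with N = a + b."""
    n = len(data)
    m = a / (a + b)           # collective mean (3.3) of the Beta prior
    z = n / (n + (a + b))     # credibility factor (2.2) with N = a + b
    xbar = sum(data) / n
    return (1 - z) * m + z * xbar


data = [1, 0, 0, 1, 0]        # five years of claim / no-claim experience
exact = beta_bernoulli_posterior_mean(data, a=2.0, b=8.0)
mixed = credibility_form(data, a=2.0, b=8.0)
print(exact, mixed)           # the two values agree up to rounding
```

Here the time constant is the sum of the two prior hyperparameters, so a more concentrated prior demands more years of experience before the sample mean dominates.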
4. LEAST-SQUARES

The next step in the development of credibility was through
least-squares theory. Suppose we have a vector-valued random variable
$\tilde{x}$ from whose observations $x$ we are trying to predict a scalar
random variable $\tilde{y}$ through a forecast function $f(x)$. Assuming we
know the joint distribution $P(y, x) = \Pr\{\tilde{y} \le y ; \tilde{x} \le x\}$,
the classical means of evaluating any $f$ is the mean-square error norm:

(4.1)  $I = \int (y - f(x))^2 \, dP(y, x)$ .

It is known that the integrable function $f$ which minimizes (4.1) at
value $I^0$ is the conditional mean:

(4.2)  $f^0(x) = E\{\tilde{y} \mid \tilde{x} = x\}$ .

In Bayesian terminology, the conditional mean minimizes quadratic Bayes'
risk.

In many cases the exact conditional calculation (3.5) is too difficult
and an approximate forecast function is sought. Since completion of the
square shows that

(4.3)  $I = I^0 + \int (f^0(x) - f(x))^2 \, dP(x)$ ,
       $I^0 = E\,V\{\tilde{y} \mid \tilde{x}\} = V\{\tilde{y}\} - V f^0(\tilde{x})$ ,

any approximate forecast $f$ can be evaluated in terms of a
least-squares fit to the conditional mean over the observation space.

A typical choice of an approximate forecast is a linear function

(4.4)  $f(x) = a_0 + \sum_j a_j x_j$ ,


where the parameters are adjusted to minimize (4.1) or (4.3). It is well
known that the optimal value $a^*$ of the vector $a = [a_j \mid j \ne 0]$ is
given by the "normal" system of equations

(4.5)  $C a^* = b$ ,

with $a_0^*$ selected so as to make the optimal linear forecast $f^*$
unbiased, i.e.,

(4.6)  $E f^*(\tilde{x}) = E\{\tilde{y}\}$ ;  $a_0^* = E\{\tilde{y}\} - a^{*\prime} m$ ;

and the covariance matrix $C$ and the vectors $b$ and $m$ are:

(4.7)  $C = V\{\tilde{x}\}$ ;  $b = C\{\tilde{x} ; \tilde{y}\}$ ;  $m = E\{\tilde{x}\}$ .


The prior variance of the optimal linear forecast is:

(4.8)  $V f^*(\tilde{x}) = a^{*\prime} C a^* = a^{*\prime} b = b' C^{-1} b = C\{f^*(\tilde{x}) ; \tilde{y}\}$ ,

giving minimal approximation mean-square error:

(4.9)  $I^* - I^0 = V f^0(\tilde{x}) - b' C^{-1} b$ ,

which is smaller, the closer the conditional mean $E\{\tilde{y} \mid x\}$ is to a
linear form. In this sense, the optimal linear $f^*(x)$ is a best
least-squares linearized Bayesian approximation.
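A small numerical sketch of the normal equations (4.5) and the unbiasedness condition (4.6) follows. The function name is hypothetical, and the test case is an assumption made for illustration: two yearly observations from the collective model of Section 3 with $E\,v(\theta) = 1$, $V\,m(\theta) = 1$ and $m = 5$, so that each observation has variance 2 and any two observations (including the one to be predicted) have covariance 1.

```python
import numpy as np


def optimal_linear_forecast(C, b, m, mean_y):
    """Solve C a = b for the slope vector and fix the intercept by (4.6)."""
    a = np.linalg.solve(C, b)      # normal equations (4.5)
    a0 = mean_y - a @ m            # intercept making the forecast unbiased
    return a0, a


C = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # V{x}: E + D on the diagonal, D elsewhere
b = np.array([1.0, 1.0])           # C{x; y}: covariance with next year's risk
m = np.array([5.0, 5.0])           # E{x}

a0, a = optimal_linear_forecast(C, b, m, mean_y=5.0)
print(a0, a)                       # intercept and the two equal slopes
print(b @ np.linalg.solve(C, b))   # prior variance of the forecast, as in (4.8)
```

In this case the slopes are equal and sum to $2/3$, and the intercept is $m/3$, which is the form $(1 - Z)m + Z\bar{x}$ with $Z = 2/3$.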

[Footnote: For any two random vectors or scalars $\tilde{\xi}$ and
$\tilde{\eta}$, we define the (possibly nonsquare and unsymmetric) covariance
matrix $C\{\tilde{\xi} ; \tilde{\eta}\} = E\{\tilde{\xi} \tilde{\eta}'\} - E\{\tilde{\xi}\} E\{\tilde{\eta}'\}$,
and call $C\{\tilde{\xi} ; \tilde{\xi}\} = V\{\tilde{\xi}\}$ the usual covariance
matrix of $\tilde{\xi}$ on itself.]

In 1967, Bühlmann [4], [5] showed the important result that, for
the collective model of Section 3, the optimal linear estimator for the
experience-rated fair premium is exactly the credibility form (2.1),
provided that the time constant in the credibility factor is chosen as
the ratio of the components of the collective variance (3.4), i.e.,

(4.10)  $N = E\,v(\theta) / V\,m(\theta) = E / D$ .

This shows that the basic credibility formula is robust, and has
mean-square error (estimation error variance)

(4.11)  $I^* = E + (1 - Z) D$ ,

which shows clearly how increasing experience data improves the estimate.

The sample mean alone, $\bar{x}$, is a poorer estimate because it has
$I = (1 + n^{-1}) E$, which is always larger than $I^*$. However, if the
prior variation, $D = V\,m(\theta)$, is very large compared to $E = E\,v(\theta)$
(a "diffuse prior") then $N$ is very small, and $f^*$ and $\bar{x}$ are
practically the same.
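The comparison between the credibility forecast and the sample mean can be checked numerically from (4.10), (4.11) and the expression $(1 + n^{-1})E$. The sketch below uses a hypothetical function name and arbitrary illustrative values of the two variance components.

```python
def buhlmann_summary(n, E, D):
    """Time constant, credibility factor and the two mean-square errors.

    n -- number of years of experience (n >= 1)
    E -- expected individual variance, E v(theta)
    D -- variance of the individual means, V m(theta), assumed positive
    """
    N = E / D                          # time constant (4.10)
    Z = n / (n + N)                    # credibility factor (2.2)
    mse_credibility = E + (1 - Z) * D  # (4.11)
    mse_sample_mean = (1 + 1 / n) * E  # sample mean used alone
    return N, Z, mse_credibility, mse_sample_mean


N, Z, mse_cred, mse_mean = buhlmann_summary(n=4, E=2.0, D=1.0)
print(N, Z, mse_cred, mse_mean)

# A diffuse prior (D much larger than E) makes the two errors nearly equal.
print(buhlmann_summary(n=4, E=2.0, D=1e6)[2:])
```

Since $(1 - Z)D = E/(n + N)$, the credibility error is $E + E/(n + N)$, which is below $E + E/n$ for every positive $N$.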

Notice the important result

(4.12)  $C\{\tilde{y} - f^*(\tilde{x}) ; f^*(\tilde{x})\} = 0$ ;

that is, any error remaining in the optimal forecast is uncorrelated
with the predictor.
5. EXACT CREDIBILITY

In 1973, the author showed that the class of likelihood-prior
families for which the credibility approximation (2.1), (2.2), (4.10)
was exactly the Bayesian conditional mean could be extended [25],
[27].

Consider the Koopman-Pitman-Darmois exponential family of
likelihoods in which the sample mean is the only sufficient statistic and
natural parametrization is chosen, i.e.,

(5.1)  $p(x \mid \theta) = \frac{a(x) \, e^{-\theta x}}{c(\theta)}$ ,  $(x \in X)$

for continuous or discrete measure in a given range $X$, determined
by the nonvanishing of $a(x)$. $c(\theta)$ is a normalizing factor to make
$\int p(x \mid \theta) \, dx = 1$.

The natural conjugate prior corresponding to the likelihood (5.1)
is

(5.2)  $u(\theta) = \frac{[c(\theta)]^{-n_0} \, e^{-\theta x_0}}{d(n_0, x_0)}$ ,  $(\theta \in \Theta)$

defined over a natural parameter space, $\Theta$, for which (5.1) is a density,
i.e., for all values of $\theta$ for which $c(\theta)$ is finite. Restrictions
on the hyperparameters $(n_0, x_0)$ may be necessary to make (5.2) a density
as well, i.e., to make the normalization $d(n_0, x_0)$ finite. We shall
henceforth assume $n_0 > 0$.

The advantage of a natural conjugate pair is that the family is
closed under sampling, that is, the density of $\theta$ posterior to the data
is of the same form as (5.2), with hyperparameter updating:

(5.3)  $n_0 \leftarrow n_0 + n$ ;  $x_0 \leftarrow x_0 + \sum_{t=1}^{n} x_t$ .

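Since $m(\theta) = -c'(\theta)/c(\theta)$ for this family, integration of (5.2) by
parts gives

(5.4)  $E\,m(\theta) = \frac{1}{n_0} \Big[ u(\theta) \Big]_{\partial\Theta} + \frac{x_0}{n_0}$ ,

where the bracket denotes $u(\theta)$ evaluated between the boundary points
of $\Theta$. $c(\theta)$ is analytic in the interior of $\Theta$, and, in most cases
of interest, $u(\theta)$ vanishes at the boundary as well, making the first
term on the RHS of (5.4) zero. The precise regularity conditions under
which this happens are complex, and are covered in [27].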

Assuming these conditions are satisfied, (5.3) then implies

(5.5)  $E\{m(\theta) \mid x_1, x_2, \ldots, x_n\} = \frac{x_0 + \sum_{t=1}^{n} x_t}{n_0 + n}$ .

The final steps, which are similar, show that $x_0 / n_0 = m$, and $n_0$
is the time constant $N$ (4.10) of Bühlmann, thus proving credibility
is exact for (5.1), (5.2).
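Closure under sampling means that the hyperparameter update (5.3) can be applied year by year or in one batch with the same result. The sketch below illustrates this with hypothetical function names and arbitrary starting values $(n_0, x_0)$; it relies only on (5.3) and (5.5).

```python
def update_hyperparameters(n0, x0, data):
    """Hyperparameter update (5.3): n0 <- n0 + n, x0 <- x0 + sum of the data."""
    return n0 + len(data), x0 + sum(data)


def conditional_mean(n0, x0):
    """Mean (5.5) implied by the current hyperparameters."""
    return x0 / n0


data = [1.0, 4.0, 4.0]

# Batch update with all three years at once.
batch = update_hyperparameters(5.0, 10.0, data)

# Sequential update, one year at a time.
state = (5.0, 10.0)
for x in data:
    state = update_hyperparameters(state[0], state[1], [x])

print(batch, state)
print(conditional_mean(*batch))

# The same number in credibility form, with m = x0 / n0 and N = n0.
n, m, N = len(data), 10.0 / 5.0, 5.0
Z = n / (n + N)
print((1 - Z) * m + Z * sum(data) / n)
```

Because only the running pair $(n_0, x_0)$ needs to be stored, the forecast can be maintained recursively without keeping the individual observations.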

Additional  examples  beyond  those  of  Bailey  and  Mayerson  are  given 
in  [25],  and  the  extension  to  credibility  mixing  of  more  general 
statistics  for  arbitrary  exponential  families  is  also  demonstrated. 
6. MULTIDIMENSIONAL CREDIBILITY

The previous results are easily extended to prediction of a
multivariate conditional mean [23], [26]. Let $\tilde{x}$ be a $p$-dimensional
random variable, depending upon a risk parameter $\theta$ through a likelihood
density $p(x \mid \theta)$; a prior density on $\theta$ is assumed known. For
$t = 1, 2, \ldots, n$, we observe $n$ independent realizations, $\tilde{x}_t = x_t$,
of this random variable for a fixed $\theta$. The problem is to make a
(vector) forecast $f(X)$ of the next observation, $E\{\tilde{x}_{n+1} \mid X\}$,
where $X$ is the $p \times n$ matrix of data $\{x_t \mid t = 1, 2, \ldots, n\}$.

From the likelihood and the prior, we calculate the vector means:

(6.1)  $m(\theta) = E\{\tilde{x} \mid \theta\}$ ;  $m = E\,m(\theta)$ ;

and then the two $p \times p$ covariance matrices:

(6.2)  $E = E\,V\{\tilde{x} \mid \theta\}$ ,

and

(6.3)  $D = V\{m(\theta)\}$ .

The total covariance matrix of any $\tilde{x}_t$ on itself is $E + D$, but the
covariance matrix between any $\tilde{x}_t$ and $\tilde{x}_u$ $(t \ne u)$ is just $D$.

Assuming that the forecast of each component of $\tilde{x}_{n+1}$ is linear
in all the data $X$, then the use of least-squares theory gives after
some algebra the multivariate credibility formula:

(6.4)  $E\{\tilde{x}_{n+1} \mid X\} \approx f^*(X) = (I - Z) m + Z \bar{x}$ ,

where $I$ is the $p \times p$ identity matrix, and $\bar{x}$ is the vector of
sample means, $\bar{x} = \sum_t x_t / n$.

The $p \times p$ credibility matrix $Z$ satisfies formulae analogous to the
one-dimensional result (2.2):

(6.5)  $Z = n (N + nI)^{-1}$ ;  $(I - Z) = \frac{1}{n} (ZN) = \frac{1}{n} (NZ)$ ;

and the $p \times p$ matrix of time constants, $N$, is analogous to (4.10):

(6.6)  $N = E D^{-1}$ .

If the eigenvalues of $N$ are $\{\nu_i\}$, then those of $Z$ are
$\{n / (n + \nu_i)\}$; one can show that in the nondegenerate case
$\lim_{n \to \infty} Z = I$. In other words, the initial forecast (no data)
is the prior mean $m$; successive forecasts utilize linear mixtures of
all sample means in varying proportions; but ultimately, each component
of the risk is estimated only through its own sample mean, as
$n \to \infty$. Specific examples are given in [23]. The $p \times p$
"preposterior" estimation error covariance matrix for the optimal vector
forecast in (6.4) is:

(6.7)  $V\{\tilde{x}_{n+1} - f^*(X)\} = E + (I - Z) D$ .

This is similar to (4.11), but of course only the diagonal elements
of the LHS of (6.7) were minimized in selecting the optimal coefficients.
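The matrix relations (6.4) to (6.6) are easy to check numerically. The following sketch uses hypothetical function names and arbitrary illustrative covariance matrices; it assumes $D$ is nonsingular so that (6.6) is defined.

```python
import numpy as np


def credibility_matrix(E, D, n):
    """Time-constant matrix N = E D^{-1} (6.6) and Z = n (N + n I)^{-1} (6.5)."""
    p = E.shape[0]
    N = E @ np.linalg.inv(D)
    Z = n * np.linalg.inv(N + n * np.eye(p))
    return N, Z


def multivariate_forecast(X, m, E, D):
    """Forecast (6.4) from a p x n data matrix X and the prior mean vector m."""
    p, n = X.shape
    _, Z = credibility_matrix(E, D, n)
    xbar = X.mean(axis=1)              # vector of sample means
    return (np.eye(p) - Z) @ m + Z @ xbar


E = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative E V{x | theta}
D = np.array([[1.0, 0.3], [0.3, 2.0]])   # illustrative V{m(theta)}
m = np.array([1.0, 2.0])
X = np.array([[0.0, 2.0, 1.0],
              [3.0, 1.0, 2.0]])          # p = 2 components, n = 3 years

N, Z = credibility_matrix(E, D, n=3)
print(Z)                                 # non-diagonal: both sample means are used
print(multivariate_forecast(X, m, E, D))
```

Because $Z$ is a function of $N$ alone, the two matrices commute, which is why both orderings appear in (6.5).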

There are also exact multi-dimensional results [26], corresponding
to Section 5, for the linear multivariate exponential family likelihood:

(6.8)  $p(x \mid \theta) = \frac{a(x) \exp\{-\theta' x\}}{c(\theta)}$ ,  $(x \in X)$
$\theta$ is now a $p$-dimensional vector in the complete parameter space $\Theta$ for
which the normalization, $c(\theta)$, is finite. This family has the vector
sample mean, $\bar{x}$, as a sufficient statistic, and has been investigated
somewhat; however, its natural conjugate priors have been little studied.

The simplest natural conjugate prior for (6.8), analogous to (5.2), is

(6.9)  $u(\theta) \propto [c(\theta)]^{-n_0} \exp[-\theta' x_0]$ ,  $(\theta \in \Theta)$

where the scalar $n_0$ and the vector $x_0$ are hyperparameters; we assume
always $u(\theta)$ vanishes at the boundary of $\Theta$. Embarrassingly, in this
case, although (6.4) is exactly the conditional mean, $Z$ degenerates to
a diagonal matrix because $N = n_0 I$, and the forecasts for each component
of $\tilde{x}$ are independent!

To remedy this, the author develops in [26] an "enriched" version of
(6.9), for likelihoods (6.8) in which $a(x)$ will factor into a product
of $p$ independent components when $x$ is subject to a linear
transformation. In this case, enough additional hyperparameters can be
introduced to make $N$ and $Z$ non-diagonal, and all components of the
sample mean are used in prediction of any one future value.

An important extension permits quadratic terms in the exponent of
(6.8), and leads to the well-known multinormal likelihood, with unknown
mean and unknown precision. The usual mean-precision prior is a
normal-Wishart distribution, due to Ando and Kaufman [1]; its "thinness" is
well-known in the literature, and is similar to the degeneracy described above.
Through the use of linear transformation, it is possible to extend the
Ando-Kaufman prior, again giving a full-dimensional credibility formula
(6.4) for the conditional mean [26]. One also finds the following
interesting credibility formula for the conditional covariance matrix of
this enriched multinormal:
7. CLASSICAL ESTIMATORS

We  have  emphasized  the  Bayesian  role  of  credibility,  either  as  an 
approximation  to  a conditional  mean,  or  as  an  exact  result  for  certain 
priors  and  likelihoods.  However,  there  are  other  interpretations 
of  credibility  which  show  its  relationship  to  classical  least-squares 
estimators. 

Suppose we rewrite (6.4) for the case in which only one
$p$-dimensional observation $\tilde{x}_1 = x_1$ is made:

(7.1)  $E\{\tilde{x}_2 \mid x_1\} \approx (I - Z_1) m + Z_1 x_1$ ,

with

(7.2)  $Z_1 = (I + E D^{-1})^{-1}$ .

Rewrite this as:

(7.3)  $E\{\tilde{x}_2 \mid x_1\} - m \approx Z_1 (x_1 - m)$ ,

and note that $Z_1$ can be written as:

(7.4)  $Z_1 = D (E + D)^{-1} = C\{\tilde{x}_2 ; \tilde{x}_1\} [V\{\tilde{x}_1\}]^{-1}$ .

In this form, we recognize (7.3) as a well-known exact result, the
regression of $\tilde{x}_2$ on $\tilde{x}_1$ for a joint multinormal distribution of
$(\tilde{x}_1 ; \tilde{x}_2)$.

For the second interpretation, write in linear model form:

(7.5)  $\tilde{x}_t = m(\theta) + \tilde{u}_t$ ,  $(t = 0, 1, 2, \ldots)$

where $\tilde{u}_t$ is an appropriate error variable, independent of $\theta$ and
other errors, and

(7.6)  $E\{\tilde{u}_t\} = 0$ ,  $(t = 0, 1, 2, \ldots)$ .


For $t = 1$, we observe $\tilde{x}_1 = x_1$, and this is an estimator of
$E\{\tilde{x}_2 \mid x_1\} = E\{m(\theta) \mid x_1\}$ with error covariance matrix

(7.7)  $V\{\tilde{u}_1\} = E\,V\{\tilde{u}_1 \mid \theta\} = E$ .

Now, a prior estimate can be thought of as an initial observation at
$t = 0$, so that the initial credibility estimate $x_0 = m$ is also an
unbiased estimate of $m(\theta)$ before the other observations begin.
Since $m$ is a constant, (7.5) shows that the error covariance of
this estimate is:

(7.8)  $V\{\tilde{u}_0\} = V\,m(\theta) = D$ .


By elementary manipulations:

(7.9)  $E\{m(\theta) \mid x_0 = m, x_1\} \approx (E^{-1} + D^{-1})^{-1} [D^{-1} m + E^{-1} x_1]$ ,

and we recognize that the "two" observations $x_0$ and $x_1$ are
combined by weighting with their respective precisions, $D^{-1}$ and
$E^{-1}$. This ancient formula for the combination of observations is
due to Gauss, and is known to be exact for $\tilde{u}_0$ and $\tilde{u}_1$ independent
and (multi-)normally distributed. The preposterior estimation error
covariance of this estimate of $m(\theta)$ is $(E^{-1} + D^{-1})^{-1}$; that is,
its precision is the sum of the two observation precisions. Similar
remarks apply to the Bayesian regression model of Section 10; see (10.10)
and [34].
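The equivalence between the precision-weighted combination (7.9) and the one-observation credibility form (7.3), (7.4) can be verified numerically. The sketch below uses hypothetical function names and arbitrary illustrative matrices, assuming both $E$ and $D$ are nonsingular.

```python
import numpy as np


def gauss_combination(m, x1, E, D):
    """Precision-weighted combination (7.9) of the prior mean and one observation."""
    E_inv = np.linalg.inv(E)
    D_inv = np.linalg.inv(D)
    return np.linalg.solve(E_inv + D_inv, D_inv @ m + E_inv @ x1)


def one_observation_credibility(m, x1, E, D):
    """Credibility form (7.3) with Z_1 = D (E + D)^{-1} from (7.4)."""
    Z1 = D @ np.linalg.inv(E + D)
    return m + Z1 @ (x1 - m)


E = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative observation error covariance
D = np.array([[1.0, 0.3], [0.3, 2.0]])   # illustrative prior covariance
m = np.array([1.0, 2.0])
x1 = np.array([3.0, 0.5])

print(gauss_combination(m, x1, E, D))
print(one_observation_credibility(m, x1, E, D))   # same vector up to rounding
```

The agreement follows from $(E^{-1} + D^{-1})^{-1} E^{-1} = (I + E D^{-1})^{-1} = Z_1$, so no normality assumption is needed for the algebraic identity itself.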

Notice  that  credibility  formulae  are  mixtures  of  a prior  mean 
and  a classical  maximum  likelihood  estimator.  This  is  true  for  a 
large  class  of  identifiable  linear  models  (10.7),  and  in  the  exact 
case  follows  easily  from  the  definition  of  exponential  families. 

Finally,  we  note  that  many  articles  in  the  statistical  literature 
develop  similar  "wide-sense  conditional  expectations"  and  "parameter 
shrinkage"  formulae,  [14],  [40],  [17],  [18],  [20],  [48],  [53];  they 
are  called  "pseudo-Bayes  estimators"  in  filter  theory  [47],  and  are 
no doubt being rediscovered in other fields as well. However, the
intimate relationship between the Bayesian and the classical approach
is not well appreciated, and there is widespread belief that these
results  are  exact  only  for  multinormal  distributions,  which  is  not 
true. 
8. OTHER CREDIBILITY MODELS

We now consider some of the many extensions of the credibility
model developed to deal with specific estimation problems in insurance.
These models all use least-squares approximations, usually exploiting
the special structure to avoid numerical inversion of the covariance
matrix in (4.5). See also the references in [13] and [38].

8.1  Other  Functions 


The first idea is that least-squares theory also applies to
functions of random variables, with appropriate modifications to (4.7).
Suppose in the basic model of (2.1), (2.2), (4.10), we replace $x$
by $I(u - x)$ throughout; here $u$ is some fixed value in the range
of $x$, and $I$ is the unit-step function, unity for nonnegative
arguments, zero otherwise. Since $\sum_t I(u - x_t)$ counts the number
of samples not greater than $u$, and $E\,I(u - \tilde{x}) = P(u)$, we get a
credible conditional distribution forecast [24]:

(8.1)  $P(u \mid x) \approx (1 - Z) P(u) + Z \left[ \sum_t I(u - x_t) / n \right]$ ,


as a mixture of the prior collective probability, $P(u)$, and the
experienced sample distribution. $Z = n / (n + N)$, as before, but the
time constant depends upon $u$:

(8.2)  $N = N(u) = \frac{E\{P(u \mid \theta) [1 - P(u \mid \theta)]\}}{V\,P(u \mid \theta)}$ .

(8.1) is exact only in the simple case of Beta/Bernoulli prior/likelihood.
If $P(u)$ is continuous, it gives a mixed function of $u$ which has
smaller mean-squared error for every $u$ than using the sample
distribution alone.
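The distribution forecast (8.1) amounts to mixing the collective distribution function with the empirical one at each point $u$. A minimal sketch follows; the function and argument names are hypothetical, and the prior probability $P(u)$ and the time constant at $u$ are taken as given inputs rather than computed from a particular prior.

```python
def credible_cdf(u, data, prior_cdf_at_u, time_constant_at_u):
    """Forecast (8.1) of P(u | x) for one fixed value u.

    data               -- observed values x_1, ..., x_n (non-empty)
    prior_cdf_at_u     -- collective probability P(u)
    time_constant_at_u -- the u-dependent time constant N(u), assumed positive
    """
    n = len(data)
    z = n / (n + time_constant_at_u)
    # Unit-step indicator: counts the samples not greater than u.
    empirical = sum(1 for x in data if x <= u) / n
    return (1 - z) * prior_cdf_at_u + z * empirical


print(credible_cdf(2.5, [1.0, 2.0, 3.0, 4.0], prior_cdf_at_u=0.4,
                   time_constant_at_u=4.0))
```

Since the time constant changes with $u$, a full forecast of the distribution would call this function with a different time constant at each point of interest.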

Clearly, the same idea also applies to estimating moments of $\tilde{x}$,
the difficulty being that higher collective moments corresponding to
(3.3), (3.4) must be known. In [5], Bühlmann develops a credibility
formula for the conditional variance $V\{\tilde{x}_{n+1} \mid x_1, x_2, \ldots, x_n\}$
based upon separation into a "variance" part and a "fluctuation"
part, and using several approximations; see also (6.10).

Estimating fractiles or order statistics by credibility seems
difficult; however, Bühlmann [9] shows that one can estimate the
mass between any two ordered data points of given rank.

In [11], de Vylder has studied the optimal form of the predictand
to be used in the semi-linear form:

(8.3)  $f(x) = a_0 + a_1 \sum_{t=1}^{n} g(x_t)$ ,

for arbitrary likelihood and prior. The resulting integral equation
uses the conditional density $p(x_2 \mid x_1)$ to find the optimal $g$,
and seems most useful for discrete $x$ [12].


8.2  Compound  Models 


A basic  concept  in  casualty  insurance  is  that  the  total  dollar 
claims  in  a given  exposure  period  is  related  to  both  frequency  and 
severity  of  a claim,  once  it  occurs;  this  leads  to  a risk  random 
variable  which  is  the  random  sum  of  other  elementary  random  variables. 


The  major  contributions  are  [7],  [21],  [23],  [43],  [44]. 
8.3 Bonus Hunger

It  is  a well-known  fact  that  experience  rating  schemes  induce 
a compensating  behavior  in  the  insured  individual.  For  example, 
small  claims  will  not  be  reported  in  order  to  keep  future  dividend 
payments down; insurance companies often encourage this, even though
it  makes  estimation  of  the  true  risk  more  difficult.  [45]  examines 
the  effect  of  this  "bonus  hunger"  on  credibility  plans. 


8.4  IBNR  Models 

Many insurance claims take a long time to "develop," that is,
a claim in year $t$ will incur "losses" in years $t$, $t+1$, $t+2$, ...;
the total dollar claim is "incurred but not (fully) reported."

Thus, at any epoch in time, one has an IBNR triangle of partially
developed claims which can be used to estimate the final totals;
correlations in observations in both the cohort and calendar time
dimensions are likely. These problems have been approached by Straub
and Kamreiter using least-squares techniques [39], [49].

8.5  Time-Dependent  Models 

The analysis of time-dependencies is of great interest. In the
general nonstationary case where $n + 1$ $p$-dimensional densities,
$p_t(x_t \mid \theta)$, $(t = 1, 2, \ldots, n+1)$, are available, the optimal linear
predictor requires inversion of an $np \times np$ covariance matrix, which
is hardly satisfactory. However, if the time-dependency is of
separable-mean type [28], that is, for the means:

(8.4)  $m_{it}(\theta) = E\{\tilde{x}_{it} \mid \theta\} = \ldots \cdot r_i(\theta)$ ,  $(i = 1, 2, \ldots, p ;\ t = 1, 2, \ldots, n+1)$

(with arbitrary time dependence of the covariance), then one can show
that only matrices of order $p \times p$ need be inverted.

Another modeling approach is to consider that the risk parameter
is itself changing over time even though the likelihood is stationary
for a given $\theta$. Thus, by specifying some joint prior
$u(\theta_1, \theta_2, \ldots, \theta_t, \ldots)$ for the risk parameter in
successive time periods, the evolutionary mechanism of the risk process
is completely determined [28]. A special case of interest is the
one-dimensional random shock model of Gerber and Jones [15], [16], [29]
in which the evolutionary mechanism provides a sequence of mutually
independent scale and location shifts $\{k_t, s_t\}$ to the location
parameters of the risk variables, so that:

(8.5)  $E\{\tilde{x}_t \mid \theta_t\} = m_t(\theta_t) = k_t \, m_{t-1}(\theta_{t-1}) + s_t$ ,  $(t = 1, 2, \ldots)$

where $\theta_t$ and $\{k_u, s_u \mid u = t, t+1, \ldots\}$ are mutually independent.
Forecasts for successive time periods follow a simple recursive
calculation scheme; in many cases where the moments of $k_t$, $s_t$ are
stable, the credibility weights are ultimately of geometric form.

In [29], the author explores in detail the "good" forms of $C$
and $b$ in (4.5) which lead to forecasts in either closed or recursive
form.


8.6  Conditional  Distributions 


An obvious criticism of the multi-dimensional formula (6.4)
is that in estimating the future value of a selected component, say
$\tilde{x}_s$, all of the remaining data is used in a linear approximation.
If one could easily calculate the $s$-th conditional density
$p_s(x_{st} \mid x_t^{(-s)} ; \theta)$, ($x_t^{(-s)}$ is $x_t$ without $x_{st}$) then
one could use just a one-dimensional linear approximation in terms of
$[x_{s1}, x_{s2}, \ldots, x_{sn}]$; now, however, the mean and variance
components of this conditional density are quite complex, and are
time-varying in the sense that different values of $x_t^{(-s)}$ must be
substituted. A complete development is given in [28]. As might be
expected, simplification occurs only if the conditional mean is of
separable type.

An important special model of this type has been analyzed by
Bühlmann and Straub [6], [10], in which $x_{1t}$ is the claim rate (total
portfolio $ claims per unit volume of business), and $x_{2t}$ is the
given volume of business in year $t$. By elementary assumptions:

(8.6)  $E\{\tilde{x}_{1t} \mid x_{2t} ; \theta\} = m_0(\theta)$ ;  $V\{\tilde{x}_{1t} \mid x_{2t} ; \theta\} = v_0(\theta) / x_{2t}$ ,

where $m_0(\theta)$ and $v_0(\theta)$ are moments associated with a single unit
volume. The credibility forecast is bilinear in $x_{1t} x_{2t}$ (total $
claims) and $x_{2t}$ (volume) and uses the latter as operational time
for credibility, rather than $n$.
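The role of volume as operational time can be sketched as follows. The explicit formula used here is the commonly quoted Bühlmann-Straub form, in which the total volume replaces $n$ in the credibility factor and the claim rates are averaged with volume weights; it is an assumption of this sketch rather than something spelled out in the text above, and the function and argument names are hypothetical.

```python
def volume_weighted_forecast(rates, volumes, m, E0, D0):
    """Claim-rate forecast with total volume as operational time.

    rates   -- observed claim rates x_1t (total $ claims per unit volume)
    volumes -- corresponding volumes of business x_2t (all positive)
    m       -- collective mean of the unit-volume rate, E m_0(theta)
    E0      -- E v_0(theta), expected unit-volume variance
    D0      -- V m_0(theta), variance of the unit-volume means (positive)
    """
    total_volume = sum(volumes)
    N = E0 / D0
    z = total_volume / (total_volume + N)   # volume plays the role of n
    total_claims = sum(r * w for r, w in zip(rates, volumes))
    weighted_rate = total_claims / total_volume
    return (1 - z) * m + z * weighted_rate


print(volume_weighted_forecast([1.0, 2.0], [1.0, 3.0], m=1.0, E0=4.0, D0=1.0))
```

With all volumes equal to one the sketch reduces to the basic formula (2.1), (2.2), since the total volume is then just the number of years.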


8.7  Minimax  Credibility 

The use of other than quadratic error norms seems mathematically
intractable in approximating unknown conditional means. An interesting
model by Bühlmann and Marazzi [8] adopts the point of view that Nature
(who picks $m$, $D$, and $E$) is playing a game against the actuary;
the resulting strategies depend strongly on the assumptions about the
regions of play open to Nature.
9. COLLATERAL DATA AND HIERARCHICAL MODELS

Suppose that the different dimensions of $\tilde{x}$ refer to $p$ different
individual risks, each with a different risk parameter, $\theta_i$,
$(i = 1, 2, \ldots, p)$, independently distributed according to the same prior
density, $u(\theta)$. In this case, it is easy to see that $E$, $D$, and $Z$ of
Section 6 are diagonal, and in predicting $E\{\tilde{x}_{s,n+1} \mid X\}$ for the
$s$-th risk, the experience data $\{x_{it} \mid (i \ne s) \ (t = 1, 2, \ldots, n)\}$
from the other members of the cohort is not used. In other words, the
one-dimensional form (2.1), mixing $m$ and the $s$-th component of
$\bar{x}$, is optimal.

This result is disturbing to many practitioners, who feel that data
from other risks in the same portfolio contains valuable collateral
information. Similar arguments are advanced about the use of cohort data
in the otherwise unrelated "empirical Bayes" approach.

Bühlmann and Straub [6], [10], [50] approach this problem by using a
homogeneous linear least-squares forecast, in which $a_0 = 0$ in (4.4), and
the $\{a_j \mid j \ne 0\}$ are selected so as to minimize (4.1), but constrained so
that the forecast is still unbiased. In the simplest credibility model
(2.1), the collective prior mean, $m$, is then replaced by the estimator:

(9.1)  $\hat{m}(X) = \frac{1}{np} \sum_{i=1}^{p} \sum_{t=1}^{n} x_{it}$ ,

the grand sample mean of all cohort data. This also eliminates the problem
of estimating $m$, but not that of estimating $N$. It also gives a larger
mean-square error than (4.11).
In [30], the author constructs a hierarchical model, in which the
insurance company and all risks in its portfolio are characterized by an
additional hyperparameter $\phi$, with a hyperprior distribution over the
universe of all such collectives. $\phi$ is the "quality" of this particular
portfolio.

Assume each individual risk $(i = 1, 2, \ldots, p)$ has first and second
moments

(9.2)  $m(\theta_i, \phi) = E\{\tilde{x}_{it} \mid \theta_i, \phi\}$ ;  $v(\theta_i, \phi) = V\{\tilde{x}_{it} \mid \theta_i, \phi\}$ .

Prior information now consists of a universal mean over all collectives,

(9.3)  $M = E\,E\,m(\theta, \phi)$ ,

and three components of universal variance:

(9.4)  $F = E\,E\,v(\theta, \phi)$ ;  $G = E\,V\,m(\theta, \phi)$ ;  $H = V\,E\,m(\theta, \phi)$ .

(In the above expressions, the inner operation is on $\theta$, with $\phi$ fixed,
and the outer operation is on $\phi$.) $F$ and $G$ correspond to the two terms
in (3.4), averaged over all possible collectives, while $H$ is a new term,
corresponding to inter-portfolio variation. Presumably, one could easily
estimate $M$, $F$, $G$, and $H$ from a nationwide bank of experience data
from different insurance companies. In predicting $\tilde{x}_s$, we now have to
"learn" simultaneously about both $\theta_s$ and $\phi$, and all collateral data
from the portfolio will be used since all risks have the same $\phi$.

The optimal estimator then consists of three terms:

(9.5)  $E\{\tilde{x}_{g,n+1} \mid X\} = (1 - Z)\,[(1 - Z_c) M + Z_c \hat{m}(X)] + Z \sum_{t=1}^{n} (x_{gt}/n)$ .

The individual credibility factor, $Z$, is given by (2.2), but in place of
(4.10), $N = F/G$.

The term in square brackets represents the best current estimate of
the fair premium for "our" collective; it mixes the universal mean, $M$,
with the grand sample mean (9.1), using a collective credibility factor

(9.6)  $Z_c = \frac{npH}{F + nG + npH}$ .

Note that $Z_c$ stays below 1 as $n \to \infty$ (its limit is $pH/(G + pH)$); that is,
$\hat{m}(X)$ is not ultimately "fully credible" for the fair premium of "our"
collective; this is because we have only a finite sample of $p$ risk
parameters $\theta_i$ for a fixed $\phi$.

If $H \to 0$, (9.5) reduces to the usual credibility formula. Bühlmann
and Straub's result can be seen as a limiting case in which $H \to \infty$; that
is, we have a "diffuse (hyper-) prior" on $\phi$, and inter-portfolio variations
are very large.
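
As a rough numerical illustration (not part of the original survey), the three-term forecast (9.5) and its two limiting cases can be sketched in Python. The function name `hierarchical_forecast`, the sample figures, and the form Z = n/(n + N) assumed here for (2.2) are illustrative assumptions only.

```python
from statistics import fmean


def hierarchical_forecast(data, risk, M, F, G, H):
    """Sketch of the three-term forecast (9.5) for one risk.

    data : list of p lists, each holding n observations x_it
    risk : index of the risk whose next observation is forecast
    M, F, G, H : universal mean and variance components, (9.3)-(9.4)
    Returns (forecast, Z, Zc).
    """
    p, n = len(data), len(data[0])
    N = F / G                                  # replaces (4.10)
    Z = n / (n + N)                            # assumed form of (2.2)
    Zc = n * p * H / (F + n * G + n * p * H)   # collective factor (9.6)
    grand_mean = fmean(x for row in data for x in row)   # estimator (9.1)
    own_mean = fmean(data[risk])               # this risk's sample mean
    collective = (1 - Zc) * M + Zc * grand_mean
    return (1 - Z) * collective + Z * own_mean, Z, Zc


# Hypothetical portfolio: p = 3 risks observed for n = 4 periods.
data = [[10, 12, 11, 13], [20, 18, 22, 20], [15, 15, 14, 16]]
forecast, Z, Zc = hierarchical_forecast(data, 0, M=12.0, F=4.0, G=8.0, H=2.0)

# H = 0: the collective term collapses to M (usual credibility formula).
usual, _, Zc_zero = hierarchical_forecast(data, 0, 12.0, 4.0, 8.0, 0.0)
# Very large H: M is effectively replaced by the grand sample mean.
diffuse, _, Zc_big = hierarchical_forecast(data, 0, 12.0, 4.0, 8.0, 1e12)
```

With H = 0 the factor Zc vanishes and only M and the risk's own mean are mixed; with a very large H, Zc approaches 1 and the grand sample mean takes the place of M, which is the homogeneous estimator described at the start of this section.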


10.  BAYESIAN  REGRESSION  MODELS 

A very general mathematical structure which contains all of the
previous models is the linear (regression) model

(10.1)  $\tilde{y} = H\tilde{\beta} + \tilde{u}$ ,

where $\tilde{y}$ and $\tilde{u}$ are $n \times 1$ random vectors of observable output variables
and unobservable error variables, respectively, $H$ is a known $n \times p$
design matrix, and $\tilde{\beta}$ is a $p \times 1$ random vector of unknown regression
parameters; we assume that a prior joint density of $(\tilde{\beta}, \tilde{u})$ is known.
Given an observation $\tilde{y} = y$, the problem is to draw posterior-to-data
inferences about $\tilde{\beta}$, or about future values of $\tilde{y}$ for some (possibly)
different design matrix; this is a problem in Bayesian regression. A
complete Bayesian regression analysis is very difficult, usually requiring
restrictive distributional assumptions or complicated algebraic manipulations
(see, e.g., [54]).

However, the linearized approach of credibility theory can be very
useful if the goal is to update only mean values of $\tilde{\beta}$ or $\tilde{y}$;
preposterior error covariances can also be determined.

Let the prior knowledge of $\{\tilde{\beta}, \tilde{u}\}$ be summarized in the mean vectors:

(10.2)  $E\{\tilde{\beta}\} = b$ ;  $E\{\tilde{u} \mid \beta\} = 0$ (for all $\beta$) ;

and the covariance matrices:

(10.3)  $V\{\tilde{\beta}\} = A$ ;  $E\,V\{\tilde{u} \mid \tilde{\beta}\} = V\{\tilde{u}\} = E$ ;

of order $p \times p$ and $n \times n$, respectively. We define also alternate-dimension
versions of the covariances:

(10.4)  $D = HAH'$ ;  $\varepsilon = (H'E^{-1}H)^{-1}$ ;

which are $n \times n$ and $p \times p$, respectively. Even if $E$ is positive
definite (most applications have $E$ diagonal, i.e., "homoscedastic errors,"
or "white noise"), $\varepsilon$ may not exist in many models of interest because
$H$ is not of full column rank $p$.

There are two versions of the credibility forecast of the mean parameter
values, $f(y) \approx E\{\tilde{\beta} \mid y\}$. In the first version:

(10.5)  $f(y) = (I - ZH)b + Zy$ ,

where $I$ is the $p \times p$ unit matrix, and $Z$ is a $p \times n$ credibility matrix

(10.6)  $Z = AH'(E + D)^{-1}$ .

This clearly exists if, say, $E$ is positive definite, and $H$ contains
only nonnegative elements; an $n \times n$ inversion is required, even if $E^{-1}$
is known, hence this form is suitable for limited-observation experiments
where $n < p$.

In the second version:

(10.7)  $f(y) = (I - z)b + z\hat{\beta}(y)$ ,

where $\hat{\beta}(y)$ is the classical (generalized) least-squares estimator of $\beta$,

(10.8)  $\hat{\beta}(y) = \varepsilon H'E^{-1}y = (H'E^{-1}H)^{-1}H'E^{-1}y$ ,

and $z$ is a $p \times p$ credibility matrix:

(10.9)  $z = (I + \varepsilon A^{-1})^{-1} = A(I + \varepsilon^{-1}A)^{-1}\varepsilon^{-1}$ .

This matrix is analogous to the usual multidimensional credibility
matrix (6.5) with "one" sample, and shows clearly the mixing of the prior
mean and the classical estimator. Moreover, in many applications $n > p$,
and (10.9) shows that only one $p \times p$ inversion is required to find $f(y)$,
if $E^{-1}$ is known, greatly reducing the computational labor. On the other
hand, to find $\hat{\beta}(y)$ explicitly, we require that $\varepsilon$ exist, which leads to
the classic problem of "identifiability," and the requirement that
rank $(H) = p$.
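
A short derivation, added here as a check and not taken from the source, indicates why the two versions give the same forecast. It writes $\varepsilon$ for the $p \times p$ matrix of (10.4), $z$ for the credibility matrix of (10.9), and $\hat{\beta}(y)$ for the estimator (10.8), and it assumes that $A$ is nonsingular and that $\varepsilon$ exists.

```latex
% Assumptions: A nonsingular; \varepsilon = (H'E^{-1}H)^{-1} exists; D = HAH'.
\begin{align*}
(A^{-1} + H'E^{-1}H)\,AH' &= H' + H'E^{-1}HAH' = H'E^{-1}(E + D), \\
\text{so}\quad Z = AH'(E + D)^{-1} &= (A^{-1} + \varepsilon^{-1})^{-1}H'E^{-1}. \\
z\,\varepsilon = A(A + \varepsilon)^{-1}\varepsilon &= (A^{-1} + \varepsilon^{-1})^{-1}, \\
\text{hence}\quad Zy = z\,\varepsilon H'E^{-1}y = z\,\hat{\beta}(y),
\qquad ZH &= z\,\varepsilon\,\varepsilon^{-1} = z .
\end{align*}
```

Substituting $ZH = z$ and $Zy = z\hat{\beta}(y)$ into (10.5) gives (10.7). When $\varepsilon$ does not exist, only the first version remains available.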

The preposterior covariance of the parameter estimation error can be
shown to be:

(10.10)  $V\{\tilde{\beta} - f(\tilde{y})\} = (I - ZH)A = (I - z)A = (A^{-1} + \varepsilon^{-1})^{-1}$ .

Cf. (7.10).
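
The formulae (10.5) through (10.10) can be checked numerically with a small NumPy sketch. The design matrix, prior moments, and simulated observation below are hypothetical values chosen only for illustration; they are not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

# Hypothetical setup: intercept-plus-trend design with full column rank.
H = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
b = np.array([1.0, 0.5])                    # prior mean of beta, (10.2)
A = np.array([[2.0, 0.3], [0.3, 1.0]])      # prior covariance of beta, (10.3)
E = 0.8 * np.eye(n)                         # error covariance, (10.3)
y = H @ np.array([1.4, 0.2]) + rng.normal(scale=np.sqrt(0.8), size=n)

I = np.eye(p)

# First version: one n x n inversion, (10.5)-(10.6).
D = H @ A @ H.T
Z = A @ H.T @ np.linalg.inv(E + D)
f1 = (I - Z @ H) @ b + Z @ y

# Second version: p x p inversions, (10.7)-(10.9); needs rank(H) = p.
E_inv = np.linalg.inv(E)
eps = np.linalg.inv(H.T @ E_inv @ H)        # epsilon of (10.4)
beta_hat = eps @ H.T @ E_inv @ y            # generalized least squares, (10.8)
z = np.linalg.inv(I + eps @ np.linalg.inv(A))
f2 = (I - z) @ b + z @ beta_hat

# Three expressions for the preposterior error covariance, (10.10).
cov_a = (I - Z @ H) @ A
cov_b = (I - z) @ A
cov_c = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(eps))

print(np.allclose(f1, f2), np.allclose(cov_a, cov_b), np.allclose(cov_b, cov_c))
```

Explicit inverses are used to mirror the formulae; in practice a linear solve such as `np.linalg.solve` would usually be preferred for numerical stability.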

Hachemeister [19] and Taylor [51] were the first to give special
versions of (10.7), (10.8) in credibility terminology. Many other results
and interpretations can be found in [34].

However, there are numerous nonBayesian versions of the above formulae
in earlier statistical literature [46], [52]. Priority for these formulae
probably belongs in the communications theory literature, where generalized
least-squares and "pseudo-Bayes" estimators have been used in linear
(Wiener-Kalman-Bucy) filters for many years (see, e.g., [47], pp. 182-4);
specialized jargon of this field has no doubt delayed recognition of the
similarities between approaches. Also, filter theory emphasizes dynamic
regression models, with recursive calculation of successive forecasts to
reduce computational labor. An example of this approach for simple trends
in regression parameters is given in [33].


11.  OTHER  APPLICATIONS 


In closing, we describe several noninsurance applications of credibility
theory.


11.1  Sampling Schemes


Classical  statistics  treats  in  great  detail  the  analysis  of  variance 
for  complex  sampling  schemes.  However,  in  many  real  applications,  one  has 
also  a prior  estimate  of  the  quantity  being  measured,  and  the  experimental 
observations  should  be  combined  optimally  with  this  prior  knowledge.  [37] 
explores  this  idea  for  various  nested  sampling  schemes;  as  might  be  expected 
from  Section  7,  the  prior  mean  value  is  combined  with  the  classical  estimators 
in  a manner  proportional  to  their  relative  precisions. 


11.2  Material  Accountability  Systems 


The  rapid  proliferation  of  nuclear  material  has  induced  development  of 
statistical  material  accountability  systems  to  monitor  and  "safeguard"  the 
production,  storage,  and  shipment  processes.  The  basic  tool  is  a material 
balance equation, which should balance out to zero if the material unaccounted
for is also zero; however, there are very difficult instrumentation
problems in measuring radioactive materials, and this balance can only be
estimated statistically. [32] describes a simple batch material balance
closure problem. More general dynamic multi-stage problems can be tackled
using the formulae of Section 10.


11.3  Instrument Calibration and Measurement and Inverse Regression

(11.1)  $\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x + \tilde{u}$ .

In calibration, relatively precise standard inputs $x_1, x_2, \ldots$ are given
to an uncalibrated instrument, resulting in outputs $y_1, y_2, \ldots$ which
contain observation errors $u_1, u_2, \ldots$ . We generally have some
joint prior information about the instrument parameters $(\beta_0, \beta_1)$, and
this is updated using the calibration data.

In measurement, we place a partially known quantity $x_0$ as input
(with some prior on its value), and observe $\tilde{y}_0 = y_0$; the problem is then
to make an inverse regression to estimate $x_0$.

Clearly, the general Bayesian formulation is to find
$E\{\tilde{x}_0 \mid y_0; (x_1, y_1); (x_2, y_2); \ldots\}$. In [31], we show that the use of
credibility breaks the estimation into two natural stages: (1) a credibility
estimation of $(\beta_0, \beta_1)$ using the standards; (2) a linearized inverse
regression using the new parameter estimates.

11.4  Network  Flows 

Many road traffic, telecommunication, and accounting processes can be
modeled as flows over networks; the basic equation is Kirchhoff's conservation
law, in which the design matrix $H$ of Section 10 is the node-arc incidence
matrix. In one formulation, there is prior knowledge about arc flows and
their prior precisions (usually highly correlated because they arise from
origin/destination "path" flows); the problem is to make a few boundary or
selected-arc flow measurements, and to infer the new arc flows. This problem
is highly "unidentifiable" in the classic sense because the rank of $H$
is one less than the number of nodes; however, the theory in (10.5) and
(10.6) still applies [35].
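
A minimal sketch of this situation, under assumptions made here and not taken from [35]: a hypothetical three-node, three-arc network in which noisy net outflows are measured at every node, with illustrative prior moments for the arc flows. The incidence matrix has rank 2, so the classical estimator does not exist, yet the first-version forecast (10.5), (10.6) can still be computed.

```python
import numpy as np

# Hypothetical network: arcs a = 1->2, b = 2->3, c = 1->3.
# Rows are nodes, columns are arcs; +1 at the tail node, -1 at the head node.
H = np.array([[ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0, -1.0]])

b = np.array([5.0, 5.0, 3.0])           # prior mean arc flows (assumed)
A = np.array([[4.0, 3.0, 0.0],          # prior covariance (assumed); arcs a and b
              [3.0, 4.0, 0.0],          # are correlated because they share a path
              [0.0, 0.0, 2.0]])
E = 0.25 * np.eye(3)                    # measurement error covariance (assumed)
y = np.array([9.0, 0.0, -9.0])          # measured net outflow at each node

rank_H = np.linalg.matrix_rank(H)       # one less than the number of nodes
rank_info = np.linalg.matrix_rank(H.T @ np.linalg.inv(E) @ H)  # singular p x p matrix

# First-version credibility forecast: only E + D needs to be inverted.
Z = A @ H.T @ np.linalg.inv(E + H @ A @ H.T)
f = (np.eye(3) - Z @ H) @ b + Z @ y

print(rank_H, rank_info)
print(f)          # updated arc flows
print(H @ f)      # implied net node flows
```

The prior implies a net outflow of 8 at node 1 and the measurement reports 9; the updated flows imply a value between the two, with the adjustment spread over the arcs according to the prior covariance.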


The usual manner of inferring lifetime distributions in reliability
studies is to place a large number of components, $N$, on a test stand, and
run them for a fixed time $T$ until $C = C(T) \ll N$ of the components have
failed. Assuming prior knowledge about the distribution parameters, the
problem is to make a posterior-to-test inference using $C$ completed lifetimes
$[\tilde{x}_i = x_i \mid i = 1, 2, \ldots, C]$ and $N - C$ incomplete lifetimes
$[\tilde{x}_i > T \mid i = C+1, C+2, \ldots, N]$.

[36] examines the proportional hazard lifetime distribution:

(11.2)  $\Pr\{\tilde{x} > x \mid \theta\} = \exp\{-\theta Q(x)\}$ ,

where $Q$ is a known prototype failure function, and $\theta$ has a Gamma prior;
it is shown that the Bayesian estimate of $\theta^{-1}$ from this test is exactly a
credibility mixture of the prior estimate of $\theta^{-1}$ together with a new
maximum likelihood estimator:

(11.3)  $\frac{1}{C} \sum_{i=1}^{C} Q(x_i) + \left(\frac{N}{C} - 1\right) Q(T)$ ,

which generalizes the well-known total-time-on-test statistic.
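
The mixture can be illustrated with a short Python sketch. It assumes, for illustration only, a Gamma prior on the hazard parameter with shape `alpha` > 1 and rate `beta` (a parameterization not specified in the text), under which the credibility factor works out to C / (alpha - 1 + C). The function names and test figures are hypothetical.

```python
def q_power(x, shape=1.5):
    """Hypothetical prototype failure function Q(x) = x**shape."""
    return x ** shape


def inverse_theta_estimates(completed, N, T, alpha, beta, Q=q_power):
    """Credibility mixture for 1/theta under the model (11.2).

    completed : the C observed failure times (each at most T)
    N, T      : number of components on test and the fixed test length
    alpha, beta : assumed Gamma prior on theta (shape, rate), alpha > 1
    Returns (mixture, exact posterior mean of 1/theta, estimator (11.3), Z).
    """
    C = len(completed)
    # Generalized total time on test: failures plus N - C survivors at T.
    S = sum(Q(x) for x in completed) + (N - C) * Q(T)
    mle = S / C                       # estimator (11.3)
    prior_mean = beta / (alpha - 1)   # prior mean of 1/theta
    Z = C / (alpha - 1 + C)           # credibility factor under the assumed prior
    mixture = (1 - Z) * prior_mean + Z * mle
    # Posterior is Gamma(alpha + C, beta + S), so E{1/theta} has closed form.
    posterior_mean = (beta + S) / (alpha + C - 1)
    return mixture, posterior_mean, mle, Z


# With Q(x) = x the model is exponential and S is the usual total time on test.
mix, post, mle, Z = inverse_theta_estimates(
    [2.0, 3.5, 5.0], N=10, T=6.0, alpha=3.0, beta=20.0, Q=lambda x: x
)
print(mix, post, mle, Z)
```

The mixture and the closed-form posterior mean agree for any choice of Q, which is the sense in which the credibility formula is exact here.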